Abstract

H.E.S.S. is an array of atmospheric Cherenkov telescopes dedicated to GeV-TeV γ-ray
astronomy. The original array has been in operation since the beginning of 2004. It is
composed of four 12-meter diameter telescopes. The installation of a fifth 28-meter
diameter telescope is being completed. This telescope will operate both in stereoscopic mode
and in monoscopic mode, i.e. without a coincident detection on the smaller telescopes. A
second-level trigger system is needed to suppress spurious triggers of the 28-meter telescope
when operated in monoscopic mode. This paper gives the motivation and principle of the
second-level trigger. The principle of operation is illustrated by an example algorithm.
The hardware implementation of the second-level trigger system of H.E.S.S. phase 2 is
described and its expected performance is then evaluated.



1. Introduction 

The H.E.S.S. (High Energy Stereoscopic System) instrument is an array of four imaging
atmospheric Cherenkov telescopes working in stereoscopic mode. It is located in the Khomas
Highland of Namibia and has been in operation since 2004. Each of the present "small"
Cherenkov telescopes (SCT) has a 12-meter diameter mirror and is equipped with a
960-pixel (photomultiplier) camera at its focal plane. It detects photons in the 100 GeV-50
TeV energy range. In addition to photon showers, the combinatorial background
from diffuse sky photons and charged cosmic-ray showers can trigger the telescopes.
Stereoscopy allows one to achieve a large rejection of the single muon triggers. These single
muons come from very distant hadronic showers and dominate the single-telescope particle
triggers [5]. The single muon trigger rate is discussed in section 2.2. The H.E.S.S.
collaboration has started to build a fifth, 28-meter-diameter large Cherenkov telescope
(LCT). The LCT will be equipped with a 2048-pixel camera at its focal plane. The LCT
will be sensitive to photons down to 10 GeV. In normal operation, the SCTs are triggered
only in case of a coincidence with another telescope (LCT or SCT). However, the energy
threshold of the SCTs is too high to efficiently detect low energy (< 50 GeV) γ rays. To
increase its acceptance at low photon energies, the H.E.S.S. instrument will have to accept
standalone LCT triggers. Assuming similar first level trigger conditions on the
SCTs and the LCT, these standalone LCT triggers would have a rate which is
typically a factor of five larger than that of H.E.S.S. stereoscopic triggers. As shown
later in section 2.2, these triggers are mostly background triggers. The H.E.S.S.
collaboration has decided to build a second level (L2) trigger board in the camera of the
LCT to improve the rejection of accidental night-sky background triggers and single muon
triggers. The L2 trigger board is programmable, which gives flexibility in the choice of the
trigger algorithms. For instance, low energy selection algorithms similar to the trigger used
by the MAGIC collaboration to detect the pulsed emission from the Crab pulsar can be
implemented. These low energy selection algorithms make it possible to lower the energy
threshold of the LCT. Alternatively, at constant energy threshold, the gain in bandwidth
obtained by rejecting the background events can be used to transfer timing information on
fired pixels to the acquisition farm, in addition to the total charge. The timing information
may be useful for analyzing single telescope events, as has also been shown by the MAGIC
collaboration jif. 

Topological triggers have been previously used on other Cherenkov instruments. For 
example, the MAGIC collaboration jsl uses a N-next-neighbor logic in its first level 
trigger and has designed a second-level trigger which can perform a rough event analysis 
and can apply topological cuts to the images. 

In the first part of this paper, the various contributions to the LCT instrument trigger
rate are listed and evaluated. The next section is devoted to the L2 concept and an example
L2 trigger algorithm is given. The actual L2 trigger board is described in section 4. Finally,
the on-board implementation of the L2 algorithm is discussed in section 5.

2. Level 1 trigger and trigger rates 

2.1. Level 1 trigger and contributions to the trigger rate 

The triggering of the H.E.S.S. phase 1 (HESS-1) instrument has been described in
detail in reference [5]. It operates in a two-step process. The first step (hereafter called
"L1 trigger") is a local camera trigger. It is a multiplicity trigger in overlapping sectors of
64 pixels. A camera trigger occurs if the signals in M pixels within a sector (multiplicity
threshold) exceed a threshold of N photoelectrons (pixel threshold). The effective time
window for pixel coincidence is 1.3 ns. The second step is the so-called "Central Trigger".
The Central Trigger system looks for coincidences of telescope triggers inside a 40 ns time
window ("stereoscopic" events). The HESS-1 array is operated in stereoscopic mode. A
coincidence of at least 2 telescopes is required in the Central Trigger time window.
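
The two-step logic described above can be illustrated with a minimal Python sketch. It is
not the H.E.S.S. firmware: the function names, the toy one-dimensional sector layout and the
symmetric treatment of the 40 ns window are assumptions made for illustration, and the
1.3 ns pixel coincidence window is ignored.

```python
import numpy as np

def l1_camera_trigger(charges_pe, sectors, pixel_threshold, multiplicity):
    """True if at least `multiplicity` pixels of one sector exceed `pixel_threshold` (p.e.)."""
    above = np.asarray(charges_pe, dtype=float) > pixel_threshold
    return any(int(above[list(sector)].sum()) >= multiplicity for sector in sectors)

def has_coincidence(t_ns, other_trigger_times_ns, window_ns=40.0):
    """True if another telescope triggered within `window_ns` of time `t_ns`.

    Treating the window as |dt| <= window_ns is an assumption of this sketch.
    """
    return any(abs(t - t_ns) <= window_ns for t in other_trigger_times_ns)

# Hypothetical toy layout: 128 pixels in a row, three 64-pixel sectors overlapping by 32.
sectors = [range(0, 64), range(32, 96), range(64, 128)]
charges = np.zeros(128)
charges[[40, 41, 42, 43]] = 6.0  # four pixels at 6 p.e.

fired = l1_camera_trigger(charges, sectors, pixel_threshold=5, multiplicity=4)
stereo = has_coincidence(100.0, [125.0])
```

With these toy inputs, the four 6 p.e. pixels satisfy a (multiplicity, pixel threshold) =
(4, 5) condition but not a (5, 5) condition.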

Data acquisition from the large telescope in phase 2 of H.E.S.S. (HESS-2) will be
triggered in three steps. The first step is a camera-level trigger similar to the L1 trigger
of HESS-1. The camera of the LCT has 96 overlapping trigger sectors. In a second step, in
addition to time coincidences between SCT L1 triggers, the Central Trigger System will
check for time coincidences of LCT and SCT triggers. The result of the latter coincidence
test (monoscopic or stereoscopic event) will be sent back to the LCT trigger management.
As in HESS-1, stereoscopic events will always be accepted. In a third step, the LCT
monoscopic events are accepted or rejected depending on the result of the L2 trigger system
computations.

The largest contributions to the trigger rate of single telescopes in γ-ray astronomy
are background events. An important source of background events is
the coincidental quasi-simultaneous firing of pixels by diffuse photons, the so-called
Night Sky Background (NSB). The NSB originates in diffuse sources, such as
the zodiacal light or unresolved light from the galactic plane, and in light from bright stars.
The NSB has been measured at the H.E.S.S. site and NSB photoelectron rates were derived
for the SCT [9]. The expected NSB photoelectron rate is 100 ± 13 MHz per pixel at zenith
in extragalactic fields. In galactic fields, the single pixel rate is higher and reaches 200-300
MHz per pixel. The LCT has a larger collection area (596 m^ compared to 108 m^), but 
more pixels (2048 instead of 960) and a smaller angular acceptance (3. 10~^ sr instead of 
6. 10^'^ sr). The expected NSB rate per pixel of the LCT is thus expected to be larger by 
a factor 1.3. 

The other source of background is cosmic-ray showers. These showers are induced by
interactions of cosmic hadrons (proton, helium) or electrons/positrons in the atmosphere.
The typical proton flux above 3 GeV is 600 m^-2 sr^-1 s^-1. Isolated muons from distant
hadron showers also trigger single Cherenkov telescopes. These muon triggers dominate
the single telescope particle triggers jsl and are easily rejected by stereoscopic triggers. 

2.2. Particle trigger rates 

The outputs from the electronics channels of H.E.S.S. were simulated with realistic
photomultiplier signal shapes and electronics readout [7]. The NSB L1 trigger rate was
simulated by adding random photoelectrons to every readout channel. Trigger rates caused
by NSB single pixel photo-electron rates of 100, 200 and 300 MHz have been calculated.
The corresponding LCT L1 trigger rates are given in table 1 for several L1 trigger
conditions. Depending on conditions, the estimated rates range from several MHz to less than
a few tens of Hz. Since the dead time per event of the LCT acquisition is of the order
of a few microseconds, the acquisition rate should be no more than ~ 100 kHz. Table 1
shows that some L1 trigger conditions (e.g. a pixel multiplicity of 3 with a pixel threshold
of 3 and an NSB pixel photo-electron rate of 200 or 300 MHz) lead to unmanageably high
trigger rates.
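
To see why a dead time of a few microseconds points to a ceiling of order 100 kHz, the
fraction of lost triggers can be estimated with a simple model. The sketch below assumes a
non-paralyzable dead time and a value of 2 μs per event; both are illustrative assumptions,
not figures taken from the LCT acquisition design.

```python
def dead_time_fraction(rate_hz, dead_time_s):
    """Fraction of triggers lost, assuming a non-paralyzable dead time per event."""
    x = rate_hz * dead_time_s
    return x / (1.0 + x)

# Assumed dead time of 2 microseconds per event (illustrative value only).
ASSUMED_DEAD_TIME_S = 2e-6

for rate in (1e3, 1e4, 1e5, 1e6):
    lost = dead_time_fraction(rate, ASSUMED_DEAD_TIME_S)
    print(f"{rate:9.0f} Hz input -> {100 * lost:5.1f} % of triggers lost")
```

Under these assumptions the loss stays at the percent level up to about 10 kHz, grows to
roughly one sixth at 100 kHz, and exceeds one half at 1 MHz.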

The proton, muon, photon and electron showers were simulated with the KASKADE 
jsj program, using parameterizations given in reference |0] • These parametrisations are 
compared to cosmic ray measurements in reference [g]. The proton trigger rate is 
shown in figure 1 as a function of the pixel threshold in photoelectrons. As stated in
section 2.1, a large fraction of the L1 trigger rate is due to single distant muons. This can
be seen by comparing figures 1 a) and b), the latter giving the total L1 rate contributed by
isolated muons. Cosmic ray electrons give a Cherenkov signal very similar to the signal of
high energy gamma rays. It is thus not possible to eliminate the electron signal. Further,
the electron contribution becomes more important at low energy, since cosmic electrons
have a very soft spectrum (with index ~ 3.3). However, the electron background, which is
a diffuse source, can be somewhat reduced in point source studies. The electron L1 trigger
rate is plotted in figure 2. An electron rigidity cut-off of 7 GV was assumed [3]. The
electron trigger rate is a few hundred Hz for typical L1 trigger conditions.

Figure 1: a) Proton trigger rates versus pixel threshold (in photoelectrons). The pixel multiplicity is 4. b)
The part of the trigger rate due to the isolated muon component of the shower. An L1 pixel multiplicity of
4 and a second level pixel threshold of 7 have been assumed. The dash-dotted line gives the raw L1 rate.
The solid line shows the rate of monoscopic events. The dashed line gives the rate of events passing the
cleaning/neighboring pixel cut. Finally, the dotted line is the rate of events passing the center of gravity
cut. Note that the center of gravity cut reduces the single muon rate by a factor of 3. In this figure and
figures 2 and 3, the curves drawn are splines connecting the results from simulation.

The total particle trigger rate is shown in figure 3 as a function of the pixel threshold.
The particle trigger rate is the sum of the proton, the helium and the electron rates. The
helium rate is approximately taken into account by multiplying the proton rate by 0.2.
The total particle trigger rate is of the order of 1 kHz for typical L1 trigger conditions.

3. Algorithms for the level 2 trigger system 

3.1. Requirements for the second level trigger 

As explained in section 2.2, the input event rate to the L2 trigger system, equal to
the L1 rate, is limited by the dead time of the front-end readout board to less than
~ 100 kHz. The output rate of the L2 system is limited by the capacity of the Ethernet
connection to the acquisition farm. The maximal output event rate cannot exceed a few
(typically 3) kHz. Table 1 shows that the input rate condition sets a strong constraint on
possible L1 conditions. For "admissible" L1 conditions (which fulfill the input event
rate condition), the NSB rate is strongly reduced except possibly when observing special
regions of the sky with a large NSB (e.g. the Galactic Center). For these L1 conditions, the
total particle + NSB rate is at the level of a few kHz. A further reduction of this rate by a
factor of 2 or 3 allows one to safely fulfill the L2 output rate condition even in very noisy
environments.

Figure 2: Electron rate as a function of the pixel threshold. An L1 pixel multiplicity of 4 and a second level
pixel threshold of 7 have been assumed. The dash-dotted line gives the raw L1 trigger rate. The solid
line shows the rate of monoscopic events. These 2 lines are almost superimposed since the electron rate is
dominated by low energy events. The dashed line gives the rate of events passing the cleaning/neighboring
pixel cut. Finally, the dotted line gives the rate of events passing the center of gravity cut.

3.2. Principle of the second level trigger 

The actual implementation of the L2 system is described in detail in section 4. This
section describes the ideas underlying its mode of operation. The L2 trigger system uses the
pixel level information (as opposed to the sector level information used in the L1 trigger)
to trigger the LCT. A gray, 2-bit image of the camera, called "combined map", is sent
to the L2 system whenever the LCT has an L1 trigger. The 2 bit values of the combined
map correspond to 2 different values of the pixel threshold, the L1 pixel threshold δ1 and
a second, higher pixel threshold δ2. The black and white maps obtained with these two
thresholds are referred to as map1 and map2 respectively. The background rejection is
achieved by running event filters, such as the one described in subsection 3.3, on map1,
map2 or the combined map. Since stereoscopic events are always accepted, the L2 trigger
system operates differently on stereoscopic and monoscopic events. When an L1 trigger of
the LCT occurs, the Central Trigger checks if another telescope was triggered. If this is the
case, then one has a stereoscopic event, which is automatically accepted by the L2 system.
If on the contrary the event is monoscopic, then it is accepted only if selected by the L2
trigger event filter.
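
The relation between the two thresholds, the two binary maps and the combined map can be
sketched as follows. The encoding of the combined map as the values 0, 1 and 2, and the
names `combined_map` and `l2_decision`, are assumptions made for this illustration; the
text above only specifies that the image carries 2 bits per pixel.

```python
import numpy as np

def combined_map(charges_pe, delta1, delta2):
    """Build a 2-bit map: 0 below delta1, 1 above delta1 only, 2 above delta2.

    map1 and map2 are the binary maps for thresholds delta1 < delta2;
    their sum is one possible (assumed) encoding of the combined map.
    """
    charges = np.asarray(charges_pe, dtype=float)
    map1 = charges > delta1
    map2 = charges > delta2
    return map1.astype(np.uint8) + map2.astype(np.uint8)

def l2_decision(is_stereo, cmap, event_filter):
    """Stereoscopic events are always accepted; monoscopic ones go through the filter."""
    if is_stereo:
        return True
    return bool(event_filter(cmap))

# Toy event with three pixels at 0, 5 and 8 photoelectrons, thresholds 4 and 7.
cmap = combined_map([0.0, 5.0, 8.0], delta1=4, delta2=7)
accepted = l2_decision(False, cmap, event_filter=lambda m: m.max() == 2)
```

Any event filter with the same call signature can be plugged in, which mirrors the
programmable nature of the L2 board.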

3.3. Event filters 

The L2 trigger event filters can be divided into two broad classes: clustering/denoising
and statistical sums over pixels.

Figure 3: Total hadronic+electron rate as a function of the pixel threshold. An L1 pixel multiplicity of 4
and a second level pixel threshold of 7 have been assumed. The dash-dotted line gives the raw L1 rate.
The solid line shows the rate of monoscopic events. The dashed line gives the rate of events passing the
cleaning/neighboring pixel cut. Finally, the dotted line is the rate of events passing the center of gravity
cut.



The clustering/denoising filters aim at removing the NSB contribution to the trigger
rate. The denoising filters remove all the isolated pixels from map1. If the resultant map
is empty, then the event is rejected. There are several possible clustering algorithms. One
possibility consists in simply demanding 2 or 3 neighboring pixel hits around a triggered
pixel. The effect of the denoising/clustering on NSB is illustrated by table 1. The clustering
algorithm asks for at least 2 triggered pixels neighboring at least one triggered pixel. The
NSB trigger rates are seen to decrease by large factors, in some cases by several orders
of magnitude (see e.g. the trigger rates for a pixel threshold of 3 photoelectrons). The
efficiency of the clustering/denoising event filter allows one to slightly decrease the L1
trigger threshold and thus to reach a smaller photon energy threshold. For example,
the clustering filter makes it possible to use the (multiplicity, pixel threshold) = (3,4) L1
trigger condition with an NSB trigger rate of less than 1 kHz.
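
The denoising and clustering conditions can be written compactly on a hexagonal pixel
grid. The sketch below is only an illustration: it assumes axial integer coordinates (q, r)
for the pixels, with the six neighbor offsets listed in `HEX_NEIGHBORS`, and represents a
binary map as a Python set of hit pixels. None of these names come from the L2 firmware.

```python
# Six nearest neighbors of a pixel in axial hexagonal coordinates (assumed convention).
HEX_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def neighbors_hit(pixel, hits):
    """Number of triggered pixels among the six nearest neighbors of `pixel`."""
    q, r = pixel
    return sum((q + dq, r + dr) in hits for dq, dr in HEX_NEIGHBORS)

def denoise(hits):
    """Remove isolated pixels, i.e. hits without any triggered neighbor."""
    return {p for p in hits if neighbors_hit(p, hits) >= 1}

def passes_clustering(hits, min_neighbors=2):
    """True if at least one triggered pixel has `min_neighbors` triggered neighbors."""
    return any(neighbors_hit(p, hits) >= min_neighbors for p in hits)

# Toy maps: a pair of adjacent pixels plus one isolated noise pixel,
# then the same map with a third pixel completing a cluster.
pair_plus_noise = {(0, 0), (1, 0), (5, 5)}
triplet_plus_noise = pair_plus_noise | {(0, 1)}

cleaned = denoise(pair_plus_noise)
```

In this toy example the isolated pixel at (5, 5) is removed by denoising, the pair alone
fails the 2-neighbor clustering condition, and the triplet passes it.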

The proton, electron, and total particle rates are displayed in figures 1, 2 and 3. These
rates are little affected by the clustering cut. The electron rate is dominated by low energy
events, so that most electron events will trigger only the LCT. The second class of filters,
statistical sums over pixels, can be used to lower the charged cosmic ray background.
These filters are run after the clustering filters and the removal of isolated
pixels from map1 (denoising). Several algorithms are currently being investigated. In
this paper, an example algorithm that can be used to reject a part of the charged particle
background is described. This algorithm is valid for point sources or weakly extended
sources of photons. It is based on two features of the photon signal. First, low energy
photons should be detectable only at small impact distances from the center of the LCT.
Second, for a given photon energy, there is a correlation between the impact distance of
the shower and the position of the center of gravity of the image in the camera. On the
contrary, the center of gravity of single muon events has a flat distribution
over the camera. Thus a cut on the position of the center of gravity of the
image in the camera will remove a fraction of the muon events proportional to
the area that was cut off while keeping most of the low energy photon events.
For illustrative purposes, we demand that the center of gravity of accepted
showers be located at less than 1.75°/√3 ≈ 1° from the expected position of
the pointed photon source. This cut should remove 2/3 of the single muon
showers. The rejection factor of general hadron showers is expected to be
smaller because showers seen simultaneously by the LCT and one or several
SCTs will be accepted. Figure 1 b) shows the rate of single muon triggers. The
comparison to figure 1 a) shows that single muon triggers dominate the charged
particle trigger rate. Roughly 80% of the muon triggers are monoscopic events,
in agreement with figure 10 of reference [5]. Finally, the rate of monoscopic single
muon events is reduced by a factor of three when the center of gravity cut is applied. The
same reduction applies to the electron background as shown in figure 2. The combined
effect of the cuts on the charged particle background is summarized in figure 3. As expected,
the charged particle rate is reduced by a factor of roughly 3.
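
The factor of three quoted for the center of gravity cut follows from a simple area
argument. The short derivation below assumes a circular field of view of radius
R = 1.75° centered on the source and a uniform distribution of muon centers of gravity
over it; the symbols R, r_cut and f_kept are introduced here for illustration only.

```latex
% Fraction of uniformly distributed centers of gravity kept by a circular cut
\[
  f_{\mathrm{kept}}
  = \frac{\pi r_{\mathrm{cut}}^{2}}{\pi R^{2}}
  = \left(\frac{r_{\mathrm{cut}}}{R}\right)^{2}
  = \left(\frac{1}{\sqrt{3}}\right)^{2}
  = \frac{1}{3},
  \qquad
  r_{\mathrm{cut}} = \frac{R}{\sqrt{3}} = \frac{1.75^{\circ}}{\sqrt{3}} \approx 1.01^{\circ} .
\]
% The rejected fraction is therefore
\[
  1 - f_{\mathrm{kept}} = \frac{2}{3} .
\]
```

Under these assumptions one third of the single muon events survive the cut, which is
consistent with the reduction by a factor of three seen in the simulated rates.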

The L2 cuts decrease the photon trigger efficiency. Figure 4 shows the effect of the
various L2 cuts on the photon efficiency. As the photon energy increases, the fraction of
monoscopic events (dot-dashed line) decreases. However, the fraction of stereoscopic events,
which are automatically accepted by the L2 system, increases. The denoising/neighboring
pixel event filter (dotted line) removes a fraction (~ 15%) of the low energy (< 20 GeV)
photons. After the center of gravity cut (dashed line), around 80% of the low energy
photons pass the L2 trigger. The fraction of accepted events (solid line) decreases with
energy, reaches a minimum of roughly 60% around 75 GeV, then increases again because of the
increasing fraction of stereoscopic events.

In summary, it is possible to efficiently remove the NSB background with a
denoising/clustering algorithm on map1. The charged particle background can be handled by
other, statistical, criteria such as the center of gravity algorithm mentioned above. Next,
in section 4, the hardware designed for the L2 trigger system is described. Then, section 5
reports on firmware and software co-design for the acceleration of the most intensive
processing steps of the algorithm and provides experimental timing results.

4. The Level 2 trigger reconfigurable hardware solution 

Depending on the objective of a given observation run, a different L2 selection algorithm 
may be preferred. The L2 trigger system therefore has to be reconfigurable, within certain 
limits. For the design of the L2 hardware it was assumed that the algorithm described in
section 3.3 is representative of other candidate L2 algorithms.

4.1. An original and cost effective hardware solution

State-of-the-art technology at the beginning of the L2 trigger board design
was provided by Xilinx's Virtex4-FX device: an FPGA (field programmable
gate array) with an embedded 32-bit PowerPC (PPC) processor running at
up to 300 MHz. Designing with this platform provides the system with the
required flexibility on both firmware and software levels. The auxiliary
processor controller unit (APU) of the PPC allows the processor to externalize the
execution of custom instructions to the FPGA fabric, while still using simple
function calls in the software. This is a powerful tool for the acceleration of
software as shown in section 5.

Table 1: Night sky background rates. Upper limits are given at the 95% C.L. Upper table: Night Sky
Background rates for various trigger conditions and NSB photoelectron rates. Lower table: effect
of denoising and clustering. The clustering condition asks for at least 2 neighbors around at least one
triggered pixel. The L1 trigger rates which exceed the maximum L1 rate of 100 kHz are shown
in boldface.

Figure 4: L2 trigger efficiency vs shower energy in the case of filtering by a simple center of gravity
algorithm. The efficiency is normalized to the L1 photon efficiency. The dot-dashed line shows the
fraction of monoscopic events. The dotted line and the dashed line show respectively the effect of the
cleaning/nearest neighbor algorithm and the combined effect of the nearest neighbor and center of
gravity algorithms. The L2 efficiency is the sum of the dashed line contribution and of the stereoscopic
events. The fraction of events accepted by the L2 trigger is shown by the solid line.

A novel non-standard solution was retained for the L2 trigger hardware,
consisting in mounting several commercially available boards as mezzanines on a
custom designed carrier board. After some preliminary studies using Avnet's
Virtex-4 FX12 Evaluation Kit (EB) and Avnet's Virtex-4 FX12 Mini-Module
(MM), it appeared that these provided the features needed for the L2
trigger. These boards are distributed by FPGA manufacturers to encourage
designers to develop new designs using their latest technology. Hence these
evaluation boards are usually very cheap even though they offer a complete
hardware environment for a wide range of applications. In our case, the EB was
chosen because of the large number of available user I/Os directly connected
to the FPGA. Indeed, the two binary maps, sent to the L2 system by the front
end (FE) on a Level 1 event, are most easily processed by the L2 algorithm
if all the data converge on a single FPGA. In practice, 64 LVDS links are
needed to transfer the images of the 2048 camera pixels from 256 FE boards,
grouped into 64 pairs of drawers (i.e. sub-crates) as shown in figure 6. The
~ 120 single-ended cables and the 30 LVDS pairs available on the two 140 pin
connectors (AvBus) on the EB provide the necessary connectivity. Another
key feature of the EB is the Micron 32 megabyte DDR SDRAM which the
PPC can address to hold code and data. The mini-module MM, despite its
very small footprint (30 mm by 65.5 mm), also packages all the functions
needed for an embedded processor system. On this board, the Virtex-4
FPGA is accessible through 76 user I/Os and is connected to 32M x 16 of DDR
memory. Both boards hold a 100 MHz oscillator for clocking purposes. This
solution greatly reduces the hardware design effort and most of the firmware
can be developed and tested on the evaluation boards concurrently. However,
there are a few drawbacks such as coping with the circuit design and the
non-standard mechanical format of the evaluation boards. Also, the commercial life
span of evaluation boards may be relatively short.

^http: / /www. xilinx.com/support / documentation/user .guides

^http://www. silica.com/services/engineering/design-tools/ads-xlx- v4fx-evll2-g.htm
'^http://www. files. em. avnet.com/files/177/fxl2 jnini_module_user_guide_l_l.pdf



4.2. Custom cPCI carrier board

The mechanical standard of the H.E.S.S. LCT camera is the 6U Compact PCI standard 
lo| . The designed cPCI board is shown on the left of figure [5l carrying four MMs and 



one EB. The 6U rear I/O board, shown on the right of figure 5, is in charge of translating
the 64 incoming LVDS pairs to unipolar signals and of forwarding these to the front side
of the crate to the main L2 trigger board. The carrier board provides the mezzanines
with the different voltage levels needed. Communication over the PCI bus is ensured by a
PCI bridge located inside a Spartan-3AN FPGA. The latter also holds the necessary
logic to communicate with the FPGAs on the mezzanines for data acquisition,
slow control, FPGA configuration and software download by JTAG via PCI.
Another important feature provided by Virtex4-FX FPGAs is the set of special logic blocks
for high-speed serial connectivity called I/OSERDES, which are available in all I/O tiles. In
our design with multiple Virtex4-FX devices, this is crucial for FPGA interconnectivity.
The EB is in charge of receiving the data from the FE, through the backplane. Additional
information about the stereoscopic nature of the incoming event reaches the EB through
the front panel. Fast serial links connect the EB to the MMs so that some data processing
tasks can be exported to the other FPGAs available in the L2 trigger system. Four slots are
reserved for the L2 system in the trigger crate of the LCT camera and fast serial links are
included in the design for interslot data transfers through the backplane. With the current
board design, the computing resources of up to 20 Virtex-4 FX12 FPGAs are available for
the implementation of the L2 trigger algorithm.

*http: / /www.xilinx.com/ support/documentation/ spartan-3an_user_guides.htm

Figure 5: View of the final L2 trigger cPCI carrier board equipped with an AVNET EB and 4 mini-modules
(on the left) and a rear I/O board for the conversion of 64 LVDS links from the FE (on the right).



5. Firmware and Software for the L2 trigger system 

Decisions relative to the design of the L2 trigger board were made based on the good
timing results obtained while running an optimized implementation of the default L2 trigger
algorithm on the Virtex4-FX evaluation boards. This ensured simple portability of the
algorithm onto the final L2 trigger hardware. This section reports on the acceleration
of the most computationally intensive steps in the L2 algorithm, namely the detection of
clusters in the data and the computation of Hillas statistics. Experimental timing
results are given using a single L2 trigger board in two different configurations involving
either one or five Virtex4-FX FPGAs.



5.1. Deserializing and synchronizing the data 

The L2 trigger system is in charge of deserializing and synchronizing the data it receives
from the FE and the central trigger following an L1 trigger. A custom asynchronous serial
protocol is used to transfer 4096 data bits from the FE to the L2 trigger system over 64
LVDS links. Each link is used to transfer 64 bits from 32 pixels on 4 adjacent FE boards as
shown in figure 6. The data are sent as 4 words of 16 bits, each preceded by a start bit and
two ID bits. Symbols last 45 ns and a gap of at least 270 ns separates two successive words
of 19 bits each so that it takes 4.18 μs to transfer the data from the FE to the L2 trigger.
The binary data are then stored in a pipeline as a 64 x 64 bit matrix. Concurrently, the
central trigger informs the L2 of the stereoscopic nature of each L1 event. This information
is stored in a different FIFO, along with the contents of several registers driven by slow
control. The latter are parameters used in the trigger algorithm (e.g. target coordinates).
The L2 trigger system as a whole is structured as a pipeline: it is compelled
to provide the local trigger management module with its decisions to accept or
reject L1 events in the very order in which these L1 events occurred. This is due
to the data acquisition FIFOs used to hold the camera data on the FE boards
while awaiting the L2 trigger decision. Note that these FIFOs have a capacity
of 50 events, which sets an upper bound on the latency of the L2 system.
The PPC on the EB is in charge of synchronizing the FE data and the central trigger data
by checking the contents of the two BRAMs on its 64 bit wide local bus (PLB). There are
different ways for the PPC to access the BRAMs in the FPGA fabric. However, the best
timing was obtained using a cacheable BRAM on the PLB. If the event to be processed
is tagged as stereo, the L2 system issues an accept signal. Otherwise, the L2 trigger
issues either an accept or a reject decision as an output of the following L2
algorithm:

1. Set to 0 all pixels in map1 that are not in clusters of at least 3 → map1'

2. IF map1' = 0 THEN Reject ELSE

3. Set to 0 all isolated pixels in map1 → map1''

4. Compute Hillas statistics of δ1 map1'' + (δ2 − δ1) map2

5. Compute distance Δ from center of gravity cog to target (x_c, y_c)

6. IF Δ > T_cog THEN Reject ELSE Accept

where, as before, map1 and map2 are the two input binary maps associated with threshold
values δ1 and δ2, map1' and map1'' denote the filtered versions of map1 produced in steps 1
and 3, (x_c, y_c) are the pointed target's coordinates on the camera plane and T_cog
is the decision threshold on the nominal distance Δ between the center of gravity of the
event maps and the target.
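
Steps 4 to 6 of the algorithm above can be sketched in a few lines of Python. This is an
illustrative floating-point version, not the integer firmware implementation: the function
name `cog_cut`, the argument names and the choice to reject an empty map are assumptions of
the sketch.

```python
import numpy as np

def cog_cut(x, y, map1_denoised, map2, delta1, delta2, target, t_cog):
    """Accept (True) if the weighted center of gravity lies within t_cog of the target.

    Pixel weights follow step 4: delta1 * map1'' + (delta2 - delta1) * map2,
    so a pixel above delta2 weighs delta2 and a pixel above delta1 only weighs delta1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = delta1 * np.asarray(map1_denoised) + (delta2 - delta1) * np.asarray(map2)
    total = weights.sum()
    if total == 0:
        return False  # empty map: rejected in this sketch (step 2 normally catches it)
    cog_x = (weights * x).sum() / total
    cog_y = (weights * y).sum() / total
    distance = np.hypot(cog_x - target[0], cog_y - target[1])
    return bool(distance <= t_cog)

# Toy event: three pixels on a line, the last one above the higher threshold.
x, y = [0.0, 1.0, 2.0], [0.0, 0.0, 0.0]
map1_denoised, map2 = [1, 1, 1], [0, 0, 1]
accepted = cog_cut(x, y, map1_denoised, map2, 5, 7, target=(1.0, 0.0), t_cog=0.2)
```

In this toy case the weights are 5, 5 and 7, so the center of gravity sits at x = 19/17,
about 0.12 away from the target, and the event passes a cut at 0.2 but not at 0.1.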

5.2. Hardware acceleration of clustering and denoising 

A purely sequential software implementation of the clustering and denoising operators
in steps 1 and 3 is clearly time-consuming and sub-optimal. We thus turned to a parallel
hardware implementation: considering the binary nature of map1, the non-linear filters
involved are easily built using logic AND and OR gates. For instance, denoising is achieved
by convolving map1 with a simple filter such that:

$$ map_1''(i) = map_1(i) \wedge \bigvee_{j=1}^{6} map_1(i_j) \qquad (1) $$

where map_1'' denotes the denoised map and i_j is used to index the 6 nearest neighbors of
pixel i. A wider filter with a support extending to second neighbors is used to detect
clusters of at least three pixels as shown in figure 6. In order to process 32 pixels from
4 adjacent FE boards in parallel, one needs the values of 58 first and second neighbor
pixels from 12 neighboring FE boards as shown in figure 6. On the edges of the camera, one
or more bytes are set to zero. We note here that filtering map1 is a local operation. For
this reason, the L2 data pipeline reorganizes the input 64 x 64 binary data matrix to
provide local access to map1 and map2. This way, each 32-bit word read by the PPC gives the
binary values of 32 pixels from one of the 64 groups of 4 adjacent FE boards of the LCT
camera and each byte maps to one FE board, as shown in figure 6.

Figure 6: Left: Each rectangle represents one of the 64 pairs of drawers (i.e. sub-crates) which compose
the camera of the LCT. Each pair of drawers contains 4 front end boards and each FE board carries 8
PMs. The 2048 pixels of the camera are on an equilateral triangular grid. However, for simplicity, each
PM has integer coordinates in the non orthonormal set of axes shown. Middle: Local integer coordinate
frames for a simple computation of Hillas parameters from a single FE board, and then from 4 FE boards
in the same pair of drawers. Right: First and second neighbors to the 32 PMs on the 4 FE boards
from the same pair of drawers: there are 26 and 32 first and second neighbors to take into account when
applying the denoising and clustering filters.

In practice, there are different possibilities for connecting the proposed logic filter to the
PPC. The best timing results were obtained using the APU [12]: experimentally, denoising
and 3 pixel cluster detection take ~ 7500 PPC clock cycles using a custom peripheral
on the PLB while they take only ~ 6200 cycles using the APU.
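
The idea of building the denoising filter from AND and OR operations applied to many pixels
at once can be mimicked in software with bitwise operations on packed rows. The sketch below
is only an analogy to the hardware filter: it assumes that each row of an axial hexagonal
grid is packed into one integer (bit q of row r is pixel (q, r)) and that pixels outside the
map read as zero. The function name and packing are hypothetical.

```python
def denoise_rows(rows, width):
    """Clear isolated pixels in a bit-packed hexagonal map.

    rows[r] is an integer whose bit q is pixel (q, r). The six neighbors of (q, r)
    are taken as (q-1, r), (q+1, r), (q, r+1), (q-1, r+1), (q, r-1), (q+1, r-1).
    """
    mask = (1 << width) - 1
    padded = [0] + list(rows) + [0]  # zero padding plays the role of the camera edge
    out = []
    for r in range(1, len(padded) - 1):
        cur, up, down = padded[r], padded[r + 1], padded[r - 1]
        # OR of the six neighbor maps, each shifted so that bit q lines up with pixel q
        any_neighbor = ((cur << 1) | (cur >> 1) | up | (up << 1) | down | (down >> 1)) & mask
        # AND with the map itself keeps only pixels that have a neighbor
        out.append(cur & any_neighbor)
    return out

# Toy map: two adjacent pixels in row 0 and one isolated pixel in row 5.
rows = [0b11, 0, 0, 0, 0, 1 << 5]
cleaned = denoise_rows(rows, width=8)
```

Each loop iteration processes a whole row with a handful of shifts, ORs and one AND, which
is the software counterpart of evaluating equation (1) for many pixels in parallel.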

5.3. Fast computation of 1st and 2nd order moments

Computing the first and second order moments of the denoised combined map is a common 
preliminary for the estimation of Hillas statistics and other parameters of interest [11]. 
Let us define: 



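$$m_x = \sum_i m_i x_i, \qquad m_y = \sum_i m_i y_i \qquad (2)$$

$$m_{xx} = \sum_i m_i x_i^2, \qquad m_{yy} = \sum_i m_i y_i^2 \qquad (3)$$

$$m_{xy} = \sum_i m_i x_i y_i, \qquad m = \sum_i m_i \qquad (4)$$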
where i indexes the 2048 pixels in the processed data maps, and m_i is the weight assigned 
to pixel i. Again, a fully sequential algorithm to compute these quantities is excluded. 
Fortunately, a faster computation is possible thanks to the structure of the 64 x 64 binary 
data matrix described in the previous section and to the linearity of the above quantities 
with respect to the pixel weights. Hence the binary maps map_1 and map_2 can be processed 
separately and the sums are profitably rearranged for an efficient hierarchical computation 
of the moments. First, a byte-addressable look-up table (LUT) is used to compute the 
desired statistics on each FE board in a local coordinate frame attached to the 8 pixels on 
a single FE board, as shown in figure 6. The LUT outputs are then combined locally to 
compute these statistics for the four FE boards in each of the 64 pairs of drawers. This 
local summation requires the LUT outputs to be properly translated depending on the 
position of a given FE board in the current pair of drawers. As shown in figure 6, a different 
coordinate frame is used when handling the 32 pixels in a pair of drawers. Finally, 
summation over the 64 pairs of drawers requires an additional transformation of these 
partial statistics to account for the translation and scaling of the local frame required to 
move a pair of drawers to its correct position within the global coordinate frame of the 
camera. In the end, the contributions of the two binary maps are linearly combined with 
weights δ_1 and δ_2 - δ_1, providing the final 32-bit integer statistics m_yy, m_xy, m_xx, m_y, m_x and 
m for the combined map. With this fast implementation, the PPC computes the first 
and second order moments of the input data in a maximum of 18000 clock cycles. For a 
faster execution time, given that these statistics will most often be estimated for low-energy 
events when only very few pixels are high in map_1 and even fewer in map_2, it is worth 
checking whether a byte is zero prior to computing its contribution to the statistics. As a result, 
the computation time will vary almost linearly with the number of active bytes in the data. 
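
The hierarchical scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the local pixel coordinates `LOCAL_XY` (a 2 x 4 block) are hypothetical and stand in for the real layout of figure 6, only the translation step is shown (the scaling of the local frame is left out), and all function names are invented for the example.

```python
# Hypothetical local coordinates of the 8 pixels on one FE board
# (assumption: a 2 x 4 block; the real layout is the one of figure 6).
LOCAL_XY = [(b % 2, b // 2) for b in range(8)]


def byte_moments(byte):
    """Moments (m, mx, my, mxx, myy, mxy) of the set bits, local frame."""
    m = mx = my = mxx = myy = mxy = 0
    for b, (x, y) in enumerate(LOCAL_XY):
        if (byte >> b) & 1:
            m += 1
            mx += x
            my += y
            mxx += x * x
            myy += y * y
            mxy += x * y
    return (m, mx, my, mxx, myy, mxy)


# Byte-addressable look-up table: one entry per possible 8-pixel pattern.
LUT = [byte_moments(v) for v in range(256)]


def translate(mom, dx, dy):
    """Re-express moments after shifting every pixel by (dx, dy)."""
    m, mx, my, mxx, myy, mxy = mom
    return (m,
            mx + dx * m,
            my + dy * m,
            mxx + 2 * dx * mx + dx * dx * m,
            myy + 2 * dy * my + dy * dy * m,
            mxy + dx * my + dy * mx + dx * dy * m)


def map_moments(boards):
    """Sum moments over (byte, dx, dy) triples, skipping empty bytes."""
    total = [0] * 6
    for byte, dx, dy in boards:
        if byte == 0:  # a zero byte contributes nothing to any moment
            continue
        for k, v in enumerate(translate(LUT[byte], dx, dy)):
            total[k] += v
    return tuple(total)


def combined_moments(boards1, boards2, d1, d2):
    """Combine the two binary maps with weights d1 and d2 - d1."""
    a = map_moments(boards1)
    b = map_moments(boards2)
    return tuple(d1 * u + (d2 - d1) * v for u, v in zip(a, b))
```

Because every moment is linear in the pixel weights, the table is built once for unit weights and the thresholds enter only in the final combination. The cost of `map_moments` grows with the number of non-zero bytes, which is the behavior described in the text.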

The algorithm proposed in section 3.3 uses only the first-order statistics to compute 
the nominal distance, necessarily in finite precision: 

$$\Delta = \sqrt{3\left(\frac{m_y}{m} - y_T\right)^2 + \left(\frac{m_x}{m} - x_T\right)^2} \qquad (5)$$

where (x_T, y_T) are the target coordinates, and where the factor 3 is due to the equilateral 
triangular grid and the accompanying √3 left out in the moment computation for simplicity. 
The specified precision on the target coordinates is 1/32nd of the unit length, motivating 
the precision to which the center of gravity (cog) coordinates have to be computed. Finally, 
thresholding the squared nominal distance avoids the costly computation of a square root. 
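
As an illustration of the last remark, the following Python sketch applies the cut on the squared nominal distance using integer arithmetic only. It assumes that the squared distance has the form 3 (m_y/m - y_T)^2 + (m_x/m - x_T)^2, that the target coordinates are supplied as integers in units of 1/32 of the unit length, and that the cut is supplied as a squared distance in the same units. It also multiplies through by m instead of dividing, which is a choice made for this example and not necessarily what the L2 firmware does. The function and parameter names are hypothetical.

```python
def passes_nominal_distance(m, mx, my, xt32, yt32, cut2_32):
    """Return True if the squared nominal distance is within the cut.

    m, mx, my : zeroth and first order moments (integers)
    xt32, yt32: target coordinates in units of 1/32 (assumed convention)
    cut2_32   : squared distance cut, in units of (1/32)**2
    """
    if m <= 0:
        return False  # empty image: no center of gravity to test
    # 32 * m * (cog - target), computed without any division
    ex = 32 * mx - m * xt32
    ey = 32 * my - m * yt32
    # Compare 3*ey^2 + ex^2 with (m * cut)^2: no square root is needed
    return 3 * ey * ey + ex * ex <= m * m * cut2_32
```

On a 32-bit processor the intermediate products would have to be checked against overflow; Python integers hide that concern here.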

5.4. Experimental timing results 

The accelerated blocks described above (leaving out second order moments) were readily 
integrated into a fully functional L2 trigger system running on the specifically designed 
hardware. In practice, two simple architectures were tested for preliminary timing experiments 
in order to benchmark the performance of the L2 trigger system. The first design 
implements the full default algorithm inside a single FPGA. The second design uses the 
five Virtex4-FX FPGAs available on one L2 trigger board. In this case, the Virtex4 on the 
EB receives the data from the FE, forwards it to the MMs and reads back their decisions. 
Events are assigned to MMs in a simple round robin procedure. Each MM is in charge 
of executing the complete decision algorithm. In this simple design, there is no pipelining 
of events in the MMs. For these experiments, an additional evaluation board is used to 
emulate the L2 trigger's environment. This testbench is capable of generating L1 trigger 
events periodically at a specified rate, as well as bursts of specified lengths. 

Consider first the unrealistic worst case scenario where all events are monoscopic 
with all pixels high above the higher threshold δ_2, resulting in the longest 
processing time. With this setup, a stable behavior of the first, single-FPGA system was 
observed up to a maximum L1 rate slightly above 10 kHz. The second, multi-FPGA system 
could sustain a maximum mean rate close to 30 kHz. In fact, the additional communication 
tasks between the EB and MMs in the multi-FPGA system are in part responsible for the 
less than four-fold gain. Adding more MMs to the system will only increase the maximum 
acceptable L1 rate to the point where the slowest concurrent process in the pipeline can 
handle it, which is ~ 60 kHz in the current multi-FPGA design. 
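
The scaling argument above can be summarized by a toy model, sketched here in Python. It assumes four MMs (inferred from the five FPGAs on the board and the "four-fold" remark), ignores the communication overhead between the EB and the MMs, and uses hypothetical function names; it gives an upper bound, not a prediction of the measured rate.

```python
def round_robin(n_events, n_mm):
    """Index of the MM that receives each successive event."""
    return [i % n_mm for i in range(n_events)]


def ideal_max_rate_khz(n_mm, single_rate_khz, bottleneck_khz):
    """Upper bound on the sustainable L1 rate.

    Each MM is assumed to sustain the single-FPGA rate, and the total is
    capped by the slowest concurrent process in the pipeline.
    """
    return min(n_mm * single_rate_khz, bottleneck_khz)
```

With the figures quoted in the text, `ideal_max_rate_khz(4, 10.0, 60.0)` gives 40 kHz, above the roughly 30 kHz actually sustained, and adding MMs beyond six would leave the bound at the 60 kHz bottleneck.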

The above estimates are clearly highly pessimistic as they are based on unrealistic 
data. A more realistic estimation of the PPC occupancy is plotted in figure 7 for the first 
implementation of the L2 trigger system. This estimate is based on the simulated statistical 
distribution of input events reported in section 2 and on time measurements of the different 
elementary steps in the L2 trigger pipeline. For typical L1 trigger conditions (multiplicity 
set to 3 or 4 and pixel threshold between 3 and 7), the estimated average processing time is ~ 
37 µs, which corresponds to a maximum mean L1 rate of 27 kHz. Actually, for these trigger 
conditions, the system occupancy is estimated to be below 20 %, as shown in figure 7. The mean 
latency of the L2 trigger in its single FPGA implementation is the average time 
spent by an event inside the L2 pipeline. This includes steps before (i.e. data 
reception and data matrix transposition) the sequential processing by the PPC 
and after (i.e. transmission of the L2 decision). Experimentally, we measured the 
average L2 latency to be ~ 38 µs. The proposed multi-FPGA system will obviously 
provide a larger safety margin in terms of PPC occupancy as well as a shorter latency. In 
all cases, the actual L1 rate will have to be determined on site. 
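
The relation between processing time, maximum rate and occupancy used above is simple arithmetic, sketched here in Python. The model (occupancy equal to the mean L1 rate times the mean processing time) is an assumption made for this illustration, the function names are hypothetical, and the 5 kHz rate used in the usage note is a made-up value, not a measurement.

```python
def max_mean_rate_khz(t_proc_us):
    """Maximum mean event rate (kHz) for a mean processing time in µs."""
    return 1e3 / t_proc_us


def occupancy(rate_khz, t_proc_us):
    """Fraction of time the processor is busy, assuming rate x time."""
    return rate_khz * t_proc_us * 1e-3
```

For a mean processing time of 37 µs, `max_mean_rate_khz(37.0)` is about 27 kHz, as quoted in the text. Under the same model, a hypothetical mean L1 rate of 5 kHz would give an occupancy of 0.185, and an occupancy below 20 % corresponds to a mean rate below roughly 5.4 kHz.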

6. Conclusion 

This paper describes the design and implementation of the L2 trigger system for the 
second phase of the H.E.S.S. experiment. The L2 trigger will be used to reject night 
sky background related and isolated muon events and thus reduce the trigger rate. The 
principle of the trigger is to build a 2-bit ("combined") map of the camera pixels at the 
time of trigger. The night sky background events can then be rejected by demanding 
clusters of pixels on the combined map. Further rejection of the hadronic background can 
be obtained by using quantities such as the center of gravity of the triggered pixels. A 
possible, illustrative, algorithm for the L2 trigger system has been given in section 3. This 
Figure 7: Estimated deadtime of the PPC in the single FPGA implementation of the L2 trigger system, 
as a function of the pixel threshold [ph.e]. The continuous and dashed lines correspond to L1 pixel 
multiplicities equal to 3 and 4 respectively. 



example algorithm shows that the required rejection of night sky background and isolated 
muon triggers is achievable. 

The hardware and software integration into the LCT camera of the previously described 
system based on a single Virtex-4 FX12 FPGA has been achieved. The L2 system still needs 
to be fully integrated in the H.E.S.S. acquisition and tested with real data. This will be 
achieved at the beginning of the HESS-2 phase. 

Acknowledgments 

The authors are grateful to D. Besin and H. Zaghia for the layout of the cPCI board, to 
P. Nayman for providing the firmware for the PCI bridge and to the anonymous reviewers 
for their valuable comments and suggestions. 

References 

[1] E. Aliu et al., Science 322, 1221 (2008) 
[2] E. Aliu et al., Astropart. Phys. 30, 293 (2009) 
[3] D. Bastieri et al., Nucl. Instrum. Meth. A461, 521 (2001) 
[4] J. Cortina, J.C. Gonzalez, Astropart. Phys. 15, 203 (2001) 
[5] S. Funk et al., Astropart. Phys. 22, 285 (2004) 
[6] J. Guy, P. Vincent, J-P. Tavernet & M. Rivoal, Astropart. Phys. 17, 409 (2002) 






[7] J. Guy, PhD thesis, Universite Pierre et Marie Curie (2003) 
[8] M.P. Kertzman & G.H. Sembroski, Nucl. Instrum. Meth. A343, 629 (1994) 
[9] S. Preuss et al., Nucl. Instrum. Meth. A481, 229 (2002) 
[10] CompactPCI Specification - PICMG 2.0 R3.0 (1999) 
[11] A.M. Hillas, Space Science Reviews 75, 17 (1996) 
[12] H.H. Ng and L. Pillai, Xilinx Application Notes, 717 (2005) 






